Identifying Word Boundaries in Handwritten Text

Authors

  • Yi Sun
  • Timothy S. Butler
  • Alex Shafarenko
  • Rod Adams
  • Martin Loomes
  • Neil Davey
Abstract

Recent work on extracting features of gaps in handwritten text allows these gaps to be classified into inter-word and intra-word classes using suitable classification techniques. In previous work, we applied several supervised classification algorithms from machine learning both to the original gap dataset and to a reduced gap dataset containing only the best features selected using mutual information. In this paper, we improve the classification result with the aid of a set of feature variables describing the strokes that precede and follow each gap. The best classification result attained suggests that the technique we employ is particularly suitable for digital ink manipulation at the level of words.
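The abstract mentions selecting the best gap features using mutual information. The following is a minimal sketch of that idea, not the authors' implementation: it assumes each feature has first been discretized (equal-width binning is our own choice here), estimates the mutual information between each feature and the inter-word/intra-word label from empirical counts, and keeps the top `k` features. The function names, the feature names `gap_width` and `prev_stroke_height`, and the toy data are all hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from math import log2


def equal_width_bins(values: Sequence[float], n_bins: int) -> list[int]:
    """Discretize a continuous feature into n_bins equal-width bins (0 .. n_bins-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant feature: everything falls in one bin
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value lands in the last bin rather than bin n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    if len(feature) != len(labels):
        raise ValueError("feature and labels must have the same length")
    n = len(feature)
    if n == 0:
        return 0.0
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written in terms of raw counts.
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns: dict[str, Sequence[Hashable]],
                  labels: Sequence[Hashable], k: int) -> list[str]:
    """Return the names of the k features with the highest mutual information with the labels."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.__getitem__, reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical toy data: each position is one gap labelled inter- or intra-word.
    labels = ["inter", "intra", "inter", "intra", "inter", "intra"]
    gap_width = [9.5, 2.0, 11.0, 1.5, 8.0, 3.0]          # hypothetical gap widths
    prev_stroke_height = [4.0, 5.0, 5.0, 4.0, 4.5, 4.5]  # hypothetical stroke feature
    columns = {
        "gap_width": equal_width_bins(gap_width, 3),
        "prev_stroke_height": equal_width_bins(prev_stroke_height, 3),
    }
    print(rank_features(columns, labels, k=1))
```

In this toy data the binned gap width separates the two classes perfectly (1 bit), while the binned stroke height carries no information about the label, so only the gap width is kept. Note that the plug-in estimator is biased upward for small samples or many bins, so the bin count is a real design choice rather than a detail.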

Similar resources

Connected Component Based Word Spotting on Persian Handwritten image documents

Word spotting aims to make unindexed image documents searchable by locating word/words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...

Off-line Cursive Handwritten Word Segmentation, A new approach

The segmentation of off-line cursive handwritten words is an important step in cursive handwriting recognition. In this paper a new, simple yet effective approach is proposed. The proposed technique is based on the analysis of the ligatures of the characters in the cursive word. The only preprocessing is to skeletonize the word to allow for variations in pen thickness and tilt in writing. There is no const...

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

The Segmentation and Identification of Handwriting in Noisy Document Images

In this paper we present an approach to the problem of segmenting and identifying handwritten annotations on noisy document images. In many types of documents such as correspondence, it is not uncommon for handwritten annotations to be added as part of a note, correction, clarification, or instruction, or for initials or a signature to appear as an authentication mark. It is important to be abl...

Word Segmentation of Handwritten Dates in Historical Documents by Combining Semantic A-Priori-Knowledge with Local Features

The recognition of script in historical documents requires suitable techniques in order to identify single words. Segmentation of lines and words is a challenging task because lines are not straight and words may intersect within and between lines. For correct word segmentation, the conventional analysis of distances between text objects needs to be supplemented by a second component predicting...

Publication date: 2004